Accelerating high-dimensional clustering with lossless data reduction

Similar resources

Accelerating Lossless Data Compression with GPUs

Huffman compression is a statistical, lossless data compression algorithm that assigns variable-length codes to symbols, giving more frequent symbols shorter codes than less frequent ones. This work is a modification of the Huffman algorithm which permits uncompressed data to be decomposed into independently compressible and decompressible blocks, allowing for co...

Adaptive dimension reduction for clustering high dimensional data

It is well known that for high-dimensional data clustering, standard algorithms such as EM and K-means are often trapped in local minima. Many initialization methods have been proposed to tackle this problem, but with only limited success. In this paper we propose a new approach to resolve this problem by repeated dimension reductions such that K-means or EM are performed only in very low dime...

High-dimensional data clustering

Clustering in high-dimensional spaces is a difficult problem that recurs in many domains, for example in image analysis. The difficulty arises because high-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a family of Gaussian mixture models designed for high-dimensional data which combine the ideas of subspace c...

Subspace Clustering of High Dimensional Data

Clustering suffers from the curse of dimensionality, and similarity functions that use all input features with equal relevance may not be effective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of loss of information encountered in global dimensionality reduction tec...

The Challenges of Clustering High Dimensional Data

Cluster analysis divides data into groups (clusters) for the purposes of summarization or improved understanding. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, or as a means of data compression. While clustering has a long history and a large number of clustering techniques have been developed in ...

Journal

Journal title: Bioinformatics

Year: 2017

ISSN: 1367-4803,1460-2059

DOI: 10.1093/bioinformatics/btx328